Automated Labeling from Biomedical Journals published in Foreign Languages

Authors

  • Jongwoo Kim
  • Daniel X. Le
  • George R. Thoma
Abstract

An automated labeling (AL) module is developed to produce bibliographic records such as English title, vernacular title, author, affiliation, and English abstract from biomedical articles published in foreign language journals. Optical character recognition (OCR) output from scanned biomedical journals is used in this labeling process. Since frequently occurring words in a zone are important features, word lists are used as key features in the AL module. The AL module uses geometric and contextual features, and geometric relations between zones, as the basis for its rule-based labeling algorithms. The algorithms use 131 rules derived for foreign language journals. Experiments conducted with several medical journal articles show about 95% accuracy.
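
The general idea of rule-based zone labeling can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Zone` fields, the two word lists, the thresholds, and the four rules are hypothetical and are not the 131 rules or the word lists used by the AL module, which the abstract does not list. The sketch only shows how a geometric feature (font size, position), a contextual feature (word-list hits), and a geometric relation between zones (a zone directly below the title) might be combined.

```python
from dataclasses import dataclass

# Hypothetical word lists; the actual lists used by the AL module are not
# given in the abstract.
AFFILIATION_WORDS = {"university", "department", "hospital", "institute"}
ABSTRACT_WORDS = {"abstract", "summary", "background", "results", "conclusion"}


@dataclass
class Zone:
    text: str         # OCR text of the zone
    top: float        # distance from page top as a fraction of page height
    font_size: float  # dominant font size in points


def word_hits(text, word_list):
    """Count words in the zone text that appear in the given word list."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    return sum(1 for w in words if w in word_list)


def label_zone(zone, largest_font):
    """Apply illustrative rules to a single zone."""
    # Geometric rule: the largest font near the top of the page is the title.
    if zone.font_size == largest_font and zone.top < 0.3:
        return "title"
    # Contextual rule: two or more affiliation words mark an affiliation.
    if word_hits(zone.text, AFFILIATION_WORDS) >= 2:
        return "affiliation"
    # Contextual rule: an abstract keyword in a long zone marks the abstract.
    if word_hits(zone.text, ABSTRACT_WORDS) >= 1 and len(zone.text.split()) > 30:
        return "abstract"
    return "other"


def label_page(zones):
    """Label zones that are assumed to be sorted from top to bottom."""
    largest = max(z.font_size for z in zones)
    labels = [label_zone(z, largest) for z in zones]
    # Relation rule: an unlabeled zone directly below the title is the author.
    for i in range(1, len(labels)):
        if labels[i] == "other" and labels[i - 1] == "title":
            labels[i] = "author"
    return labels
```

Calling `label_page` on a list of `Zone` objects ordered from top to bottom returns one label per zone. A real system would need many more rules and journal-specific tuning than this sketch suggests.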


Similar resources

Automated Labeling Algorithms for Biomedical Document Images

The National Library of Medicine (NLM) has developed an automated system, named Medical Article Records System (MARS), to process bibliographic data (title, authors, affiliation, abstract, etc.) in biomedical journal articles for its MEDLINE database. This paper describes a labeling module in MARS, which automatically extracts the bibliographic data in biomedical journal articles. The label...


Automated Labeling Of Biomedical Online Journal Articles

An automated labeling (AL) module has been developed to automate the extraction of bibliographic data (e.g., article title, authors, affiliation, abstract, and others) from online biomedical journals for the National Library of Medicine’s MEDLINE database. The AL module employs string matching, statistics, and fuzzy rule-based algorithms to identify segmented zones in an article’s HTML pages a...


Application of the CONSORT statement to randomized controlled trials comparing endoscopic and open carpal tunnel release.

BACKGROUND The CONSORT (Consolidated Standards of Reporting Trials) statement was developed by a group of clinical trialists, biostatisticians, epidemiologists and biomedical editors as a means to improve the quality of reports of randomized controlled trials (RCTs). The purpose of the present study is to assess the reporting quality of published RCTs that compare endoscopic carpal tunnel relea...


Automated Labeling of Zones from Scanned Documents

The Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), is developing an automated system, the Medical Article Record System (MARS), to identify and convert bibliographic information from printed biomedical journals to electronic format for inclusion in the MEDLINE database. This paper describes one aspect of ...


Automated labeling of bibliographic data extracted from biomedical online journals

A prototype system has been designed to automate the extraction of bibliographic data (e.g., article title, authors, abstract, affiliation and others) from online biomedical journals to populate the National Library of Medicine’s MEDLINE® database. This paper describes a key module in this system: the labeling module that employs statistics and fuzzy rule-based algorithms to identify segmented ...



Publication date: 2004